Predicting Frailty and Geriatric Interventions in Older Cancer Patients: Performance of Two Screening Tools for Seven Frailty Definitions—ELCAPA Cohort

Simple Summary
Screening tools have been developed to identify patients warranting complete geriatric assessment (GA). However, GA lacks standardization and does not capture important aspects of geriatric oncology practice such as actual treatment decisions based on GA findings, expert-based clinical classifications, and/or broader approaches to frailty. We compared the diagnostic performance of screening tools G8 and modified G8 according to: (1) the detection of ≥1 or (2) ≥2 GA impairments, (3) the prescription of ≥1 geriatric intervention and identification of an unfit profile according to (4) a latent class typology, expert-based classifications from (5) Balducci, (6) the International Society of Geriatric Oncology task force (SIOG), and (7) a GA frailty index according to the Rockwood accumulation of deficits. Our findings support the clinical value of the original and modified G8 for detecting a variety of health profiles evocative of frailty in older cancer patients, with evidence of better diagnostic performance of the modified G8 than that of the original G8.

Abstract
Screening tools have been developed to identify patients warranting a complete geriatric assessment (GA). However, GA lacks standardization and does not capture important aspects of geriatric oncology practice. We measured and compared the diagnostic performance of screening tools G8 and modified G8 according to multiple clinically relevant reference standards. We included 1136 cancer patients ≥ 70 years old referred for GA (ELCAPA cohort; median age, 80 years; males, 52%; main locations: digestive (36.3%), breast (16%), and urinary tract (14.8%); metastases, 43.5%).
Area under the receiver operating characteristic curve (AUROC) estimates were compared between both tools against: (1) the detection of ≥1 or (2) ≥2 GA impairments, (3) the prescription of ≥1 geriatric intervention and the identification of an unfit profile according to (4) a latent class typology, expert-based classifications from (5) Balducci, (6) the International Society of Geriatric Oncology task force (SIOG), or using (7) a GA frailty index according to the Rockwood accumulation of deficits principle. AUROC values were ≥0.80 for both tools under all tested definitions. They were statistically significantly higher for the modified G8 for six reference standards: ≥1 GA impairment (0.93 vs. 0.89), ≥2 GA impairments (0.90 vs. 0.87), ≥1 geriatric intervention (0.85 vs. 0.81), unfit according to Balducci (0.86 vs. 0.80) and SIOG classifications (0.88 vs. 0.83), and according to the GA frailty index (0.86 vs. 0.84). Our findings demonstrate the robustness of both screening tools against different reference standards, with evidence of better diagnostic performance of the modified G8.


Introduction
In order to detect health problems in older patients with cancer and tailor treatment decisions accordingly, multidimensional geriatric assessment (GA) is recommended [1]. Because GA is time-consuming and requires specific expertise, screening tools have been developed to help identify potentially frail patients warranting complete GA. This two-step approach is intended particularly for clinical settings where performing GA for all older patients is not feasible. However, there is no single definition of this target population, nor of what the reference standard should be. Although two main approaches have been proposed to define frailty, namely, the cumulative deficit model described by Rockwood et al. [2] and the physical phenotype developed by Fried [3], there is currently no consensus and no broadly accepted standard for measuring frailty in older cancer patients. Several classifications, usually based on clinical expertise and professional consensus, have been used, but their concordance was variable, with different patients identified as frail depending on the criteria used [4].
In the geriatric oncology setting, a pragmatic definition based on ≥1 abnormal GA test assessing key GA domains (i.e., functional status, comorbidity, cognition, mood, nutrition) has mostly been used for developing and validating screening instruments [5-7], but this approach is hampered by a lack of standardization of GA components across studies. This definition also does not capture important aspects of the reality of clinical practice in geriatric oncology, such as actual treatment decisions based on GA findings, expert-based clinical classifications, and/or broader approaches to frailty.
The G8 [5] and modified G8 [6] screening tools were specifically developed for older patients with cancer using the definition of ≥1 GA impairment as the reference standard.
The original G8 was compared with the GA in 16 studies of older patients with cancer [5,6,8-21]; 7 studies used a cutoff for impairment of ≥1 GA deficiency, reporting sensitivity ranging from 65% to 90% and mean specificity of 55% (range 3% to 100%). Twelve studies reported results using a cutoff for impairment of ≥2 GA deficiencies, with sensitivity ranging from 38% to 97% and specificity from 29% to 79%. Another study [16] of patients with hematologic disorders used Fried's criteria [3] to assess the performance of the G8, reporting results of similar magnitude, with sensitivity of 82% and specificity of 51%. Outside the oncological setting, only a few studies have evaluated screening tools against definitions other than the GA [22,23]. To our knowledge, no other reference standard has been tested with the G8. The modified G8 was developed to improve on the original G8, which is among the most sensitive tools but lacks specificity, and it achieved both appropriate sensitivity and specificity for predicting an abnormal GA. No study has yet reported on the diagnostic performance of the modified G8 using reference standards other than an abnormal GA result. We therefore aimed to measure and compare the diagnostic performance of the original G8 versus the modified G8 using six other classifications evocative of a state of frailty.

Study Design and Patients
We studied patients recruited between January 2007 and June 2015 from the ELCAPA prospective cohort study, for whom complete data on the six reference standards and screening G8 scores were available (n = 1136, Figure 1). Consecutively enrolled patients ≥70 years old with a diagnosis of solid cancer or hematologic malignancy were referred for GA to one of ten geriatric oncology clinics in teaching hospitals in the Paris urban area. The study was approved by the institutional review board of the Henri-Mondor Teaching Hospital, Creteil, France, and each patient provided written informed consent before inclusion. All research was performed in accordance with relevant guidelines and regulations. The study is registered at ClinicalTrials.gov (NCT02884375; accessed on 3 January 2022).

Reference-Standard Definitions
The term "reference standard" describes the best available method for establishing the presence or absence of the condition of interest [24]; it thus constitutes the ultimate measure against which new diagnostic or screening tests are compared in diagnostic accuracy studies. However, this assumes that an established reference standard is available and has perfect accuracy, which is not always the case. Reference-standard tests for many diseases may be difficult to implement because of their invasiveness, or may lack 100% accuracy or a clear cutoff value. In other cases, no unequivocal definition of the target condition is available, which prevents the characterization of a clear and definite reference standard.
In accordance with our research objectives, a wide spectrum of definitions was thus considered to check the ability of screening tools to identify a state evocative of frailty. The following reference-standard definitions evocative of a state of frailty were tested: (1) detection of ≥1 or (2) ≥2 impaired components of the GA, (3) prescription of ≥1 intervention by the geriatrician and identification of an unfit profile as defined by (4) a latent class typology (LCT) approach [25], (5) expert-based classifications from Balducci [26] and (6) the International Society of Geriatric Oncology task force (SIOG) classification [27], or using (7) a GA frailty index according to Rockwood accumulation of deficits principles [2].

Geriatric Assessment
The GA included a variety of domains covering functional status, mobility, nutrition, cognition, mood, and comorbidities, as used in the development of the modified G8 screening tool [6] and in accordance with international recommendations [28]. Comorbidities were assessed with the CIRS-G (abnormal if ≥1 comorbidity of grade 3 or 4). The thresholds considered for an abnormal GA were ≥1 and ≥2 impaired components.

Geriatric Interventions
For each patient, the geriatric interventions proposed after GA were documented. For the present analysis, and after internal review by two expert geriatricians (ML, PC), we considered the geriatrician's final recommendation for adapting the anticancer treatment, as well as four domains covering clinically relevant deficiencies that may warrant further geriatric interventions: nutritional, home, neuropsychological, and social support. Prescription of ≥1 of these interventions by the geriatrician was defined as the reference standard.

Frailty Classifications
Four classifications were considered to approach the non-standardized definition of frailty: the Balducci and SIOG classifications, the LCT, and a GA-derived frailty index, using the "unfit" profiles as reference standards.
Details regarding the specific indicators and measures used to classify patients as fit or unfit (the latter grouping the vulnerable, frail, and too-sick categories) are given in Supplementary Tables S1 (Balducci and SIOG classifications), S2 (LCT), and S3 (GA frailty index).
According to the Balducci classification derived from the criteria described by Balducci et al. [26,29], and as implemented in Ferrat et al. [4], fit patients were defined as functionally independent (no dependence in ADL and IADL), without serious comorbidity (retaining CIRS-G grade 0, 1, or 2 for the present analysis) and without geriatric syndromes, and unfit patients as dependent in one or more ADL (≤5/6) and/or one or more IADL (≤7/8) and/or with one or more severe comorbidities (retaining CIRS-G grade 3 or 4 for the present analysis) and one or more geriatric syndromes (Table S1). As in Ferrat et al. [4], the geriatric syndromes considered were dementia, delirium, depression, urinary and/or fecal incontinence, and falls (≥1 fall in the last 6 months); three geriatric syndromes also used by Balducci, namely, osteoporosis, neglect and abuse, and failure to thrive, were not available in our database and were thus disregarded.
According to the SIOG classification [27], fit patients were defined as having no serious comorbidity (CIRS-G grade 0, 1, or 2), being functionally independent (no dependence in ADL or IADL), and not being malnourished. Unfit patients were defined as dependent in one or more ADL (≤5/6) or IADL (two categorizations were considered to define impairment: ≤7/8 for all patients; or ≤7/8 for women and ≤3/4 for men) and/or having one or more severe comorbidities (CIRS-G grade 3 or 4) and/or malnutrition (Table S1). Because the original definition of malnutrition (≥5% weight loss during the previous 3 months) was not available in our database, we used the following substitute, according to French guidelines [30]: ≥5% weight loss in the last month and/or ≥10% within the last 6 months.
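The SIOG rule described above is a disjunction: a patient is unfit as soon as any one criterion is met. The following is a minimal sketch of that logic, assuming a hypothetical patient record whose field names we invented for illustration; score ranges follow the text (ADL out of 6, IADL out of 8, or out of 4 for men in the sex-specific variant). It is not the ELCAPA analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    sex: str                   # "F" or "M"
    adl: int                   # ADL score out of 6 (6 = fully independent)
    iadl8: int                 # 8-item IADL score out of 8
    iadl4: Optional[int]       # 4-item IADL score out of 4 (men, sex-specific variant)
    max_cirsg_grade: int       # highest CIRS-G severity grade across systems, 0-4
    weight_loss_1m_pct: float  # % weight loss over the last month
    weight_loss_6m_pct: float  # % weight loss over the last 6 months


def malnourished(p: Patient) -> bool:
    # Substitute definition from the text: >=5% in 1 month and/or >=10% in 6 months.
    return p.weight_loss_1m_pct >= 5 or p.weight_loss_6m_pct >= 10


def iadl_impaired(p: Patient, sex_specific: bool = False) -> bool:
    # Default: <=7/8 for all patients; variant: <=3/4 for men, <=7/8 for women.
    if sex_specific and p.sex == "M":
        if p.iadl4 is None:
            raise ValueError("4-item IADL score required for men in the sex-specific variant")
        return p.iadl4 <= 3
    return p.iadl8 <= 7


def siog_unfit(p: Patient, sex_specific: bool = False) -> bool:
    # Unfit if any criterion is met; fit only when none is.
    return (
        p.adl <= 5
        or iadl_impaired(p, sex_specific)
        or p.max_cirsg_grade >= 3
        or malnourished(p)
    )
```

Keeping the IADL variant as a flag makes it easy to rerun the same classification under both categorizations, as done in the sensitivity analysis of the SIOG definition.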
Additionally, we considered an LCT developed in a population of older patients with cancer, combining components of the GA [25]. Scoring equations were based on a set of indicators and covariates (Table S2) yielding posterior class membership probabilities for each patient. A patient was classified as "fit" if the probability of membership in class 1 (relatively healthy) was ≥50% and unfit if the probability was <50%.
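The actual LCT scoring equations (Table S2) include covariates and are not reproduced here. To illustrate how posterior class membership probabilities arise and how the 50% rule is applied, the sketch below assumes a simplified latent class model with binary indicators that are conditionally independent given class membership. The two-class model and all of its parameters are hypothetical, not the published ELCAPA typology.

```python
from typing import List, Sequence


def posterior_class_probs(
    x: Sequence[int],
    class_priors: Sequence[float],
    item_probs: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior P(class k | x) via Bayes' rule, assuming binary indicators
    that are conditionally independent given class (local independence).

    item_probs[k][j] is P(indicator j == 1 | class k).
    """
    joint = []
    for prior, probs in zip(class_priors, item_probs):
        likelihood = prior
        for xj, pj in zip(x, probs):
            likelihood *= pj if xj == 1 else 1.0 - pj
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]


def lct_is_fit(x, class_priors, item_probs, healthy_class: int = 0) -> bool:
    # Rule from the text: "fit" if P(relatively healthy class) >= 50%, else unfit.
    return posterior_class_probs(x, class_priors, item_probs)[healthy_class] >= 0.5


# Hypothetical two-class model with three binary indicators
# (e.g., ADL dependence, malnutrition, cognitive impairment); NOT the ELCAPA parameters.
PRIORS = [0.6, 0.4]
ITEM_PROBS = [[0.1, 0.2, 0.1],  # class 0: relatively healthy
              [0.7, 0.8, 0.6]]  # class 1: less healthy
```

With these made-up parameters, a patient with no impaired indicator is classified as fit and a patient with all three impaired as unfit; the 50% rule applies unchanged when the model has more than two classes.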
Lastly, a frailty index was constructed according to the cumulative deficit model developed by Rockwood et al. [2] and following recommendations for constructing a frailty index from Searle et al. [31]. A frailty index was derived from GA findings, considering 52 health deficits to be combined into a global index ranging from 0 to 1, where 0 corresponds to no deficit being present, and 1 to all 52 deficits being present (Table S3).
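In the cumulative deficit model, the index is the sum of deficit scores divided by the number of deficits considered. The sketch below assumes each deficit is coded from 0 (absent) to 1 (fully present), that missing items are dropped from both numerator and denominator, and that at least 30 deficits must be observed (in line with the commonly cited recommendation of Searle et al.). How missing items were handled in this study is not stated here, so that part is an assumption.

```python
from typing import Optional, Sequence


def frailty_index(deficits: Sequence[Optional[float]], min_items: int = 30) -> float:
    """Rockwood-style frailty index: sum of deficit scores / number of deficits assessed.

    Each deficit is coded in [0, 1]; None marks a missing item, which is
    excluded from both the numerator and the denominator (an assumption).
    """
    observed = [d for d in deficits if d is not None]
    if len(observed) < min_items:
        raise ValueError("too few deficits assessed to compute a reliable index")
    for d in observed:
        if not 0.0 <= d <= 1.0:
            raise ValueError("deficit scores must lie in [0, 1]")
    return sum(observed) / len(observed)
```

For example, a patient with 13 of 52 binary deficits present has an index of 13/52 = 0.25; graded deficits (e.g., 0.5 for partial impairment) fit the same formula.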

Screening Tools
The G8 screening tool includes 8 items (Table S4). Total scores range from 0 to 17, and a score of ≤14 is defined as abnormal [5]. The modified G8 tool includes 6 items (Table S5). Total scores range from 0 to 35, and a score of ≥6 is defined as abnormal [6].
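Note that the two cutoffs point in opposite directions: on the original G8, lower scores are abnormal, whereas on the modified G8, higher scores are abnormal. A minimal sketch of the two classification rules, using only the ranges and cutoffs stated above (item-level scoring from Tables S4 and S5 is not reproduced):

```python
def g8_abnormal(score: float) -> bool:
    # Original G8: total 0-17, lower = worse; <=14 is abnormal.
    # float allows half-point item scores.
    if not 0 <= score <= 17:
        raise ValueError("G8 score must lie between 0 and 17")
    return score <= 14


def modified_g8_abnormal(score: float) -> bool:
    # Modified G8: total 0-35, higher = worse; >=6 is abnormal.
    if not 0 <= score <= 35:
        raise ValueError("modified G8 score must lie between 0 and 35")
    return score >= 6
```

The direction matters when computing AUROCs: the original G8 score must be reversed (e.g., negated) so that, for both tools, higher values indicate a higher likelihood of being unfit.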

Statistical Analysis
Sample size calculation was based on our preliminary work [7] evaluating the performance of the G8 and modified G8 in identifying older cancer patients likely to have an abnormal GA, which estimated areas under the receiver operating characteristic curve (AUROC) of 86.5% and 91.6%, respectively. For a comparison of AUROCs between the two instruments at a two-sided alpha of 5%, and assuming an expected frailty prevalence of 90%, we calculated that the inclusion of at least 838 patients would yield a statistical power of 80% to detect a minimal difference in AUROC of 4% [32,33].
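To make the logic of such a calculation concrete, the sketch below approximates the power of a two-sided z-test comparing two AUROCs measured on the same patients, using the Hanley-McNeil variance approximation and an assumed correlation r between the two AUROC estimates, then searches for the smallest n reaching 80% power. The correlation value and the AUROC pair used in the example (0.865 vs. 0.905, a 4% difference) are illustrative assumptions; the exact inputs and method behind the figure of 838 patients are not given here, so this sketch is not expected to reproduce it.

```python
import math
from statistics import NormalDist


def hanley_mcneil_var(auc: float, n_pos: int, n_neg: int) -> float:
    """Hanley-McNeil (1982) approximation of the variance of an AUROC estimate."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)


def power_paired_auc(n: int, auc1: float, auc2: float, prevalence: float,
                     r: float, alpha: float = 0.05) -> float:
    """Approximate power to detect AUC1 != AUC2 for two tests applied to the
    same n patients, given an assumed correlation r between the AUC estimates."""
    n_pos = round(n * prevalence)  # patients with the condition (e.g., frail)
    n_neg = n - n_pos
    if n_pos < 2 or n_neg < 2:
        return 0.0
    v1 = hanley_mcneil_var(auc1, n_pos, n_neg)
    v2 = hanley_mcneil_var(auc2, n_pos, n_neg)
    se_diff = math.sqrt(v1 + v2 - 2 * r * math.sqrt(v1 * v2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The contribution of the opposite rejection tail is negligible and ignored.
    return NormalDist().cdf(abs(auc2 - auc1) / se_diff - z_crit)


def min_sample_size(auc1: float, auc2: float, prevalence: float, r: float,
                    power: float = 0.80, alpha: float = 0.05,
                    n_max: int = 100_000) -> int:
    # Linear search for the smallest n whose approximate power reaches the target.
    for n in range(4, n_max + 1):
        if power_paired_auc(n, auc1, auc2, prevalence, r, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")
```

Usage: `min_sample_size(0.865, 0.905, prevalence=0.9, r=0.5)`. With a 90% prevalence, the small number of non-frail patients dominates the variance, which is why large samples are needed even for AUROCs close to 0.9.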
The study population was described in terms of clinical and demographic characteristics and GA results. Univariate logistic regression analyses were used to assess the associations between the reference standards and both screening tools, estimating unadjusted odds ratios (ORs) and 95% confidence intervals (CIs). We tested for the equality of the regression coefficients of both tools. AUROC estimates with 95% CIs were calculated to compare the diagnostic performance of both screening tools against the reference standards, and the equality of the AUROC values of the two tools was tested with the nonparametric method of DeLong et al. [34]. We additionally investigated whether a different cutoff value provided better discriminative performance for each reference standard. Sensitivities and specificities were calculated for optimal cutoff values (those prioritizing sensitivity), along with their 95% CIs, and were compared with McNemar's chi-squared test. Positive predictive values (PPV), negative predictive values (NPV), positive likelihood ratios (LR+), and negative likelihood ratios (LR−) were additionally calculated. All tests were two-tailed, and the significance threshold was P < 0.05. All analyses used Stata v13 (StataCorp, College Station, TX, USA).
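Because both tools are scored on the same patients, their AUROCs are correlated, and DeLong's method accounts for this through per-patient "structural components". The following is a minimal pure-Python sketch of that test, written for illustration; it is not the Stata routine used in the analysis. Scores must be oriented so that higher values indicate a higher likelihood of the condition, so the original G8 score would be passed negated.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for ties.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float]]:
    # V10: per positive case; V01: per negative case (DeLong structural components).
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    # Sample covariance with an (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores1: Sequence[float], scores2: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Compare two correlated AUROCs measured on the same patients.

    labels: 1 = condition present, 0 = absent. Returns (auc1, auc2, z, two-sided p).
    """
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    m, n = len(pos), len(neg)
    v10_1, v01_1 = _components([scores1[i] for i in pos], [scores1[i] for i in neg])
    v10_2, v01_2 = _components([scores2[i] for i in pos], [scores2[i] for i in neg])
    auc1, auc2 = sum(v10_1) / m, sum(v10_2) / m
    # Var(AUC1 - AUC2) = Var(diff of V10)/m + Var(diff of V01)/n
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    z = (auc1 - auc2) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc1, auc2, z, p
```

The pairwise comparison is O(m·n), which remains fast for a cohort of about 1000 patients; the variance is zero, and the test undefined, if both tools rank every pair identically.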

Patient Characteristics and Geriatric Interventions
Main patient characteristics and GA results are shown in Table 1. Median age was 80 years (interquartile range (IQR) 76-85). The most frequent cancers were digestive (36.3%), followed by breast (16%) and urinary tract (14.8%) cancers, and almost half of the patients had metastases (43.5%). Loss of functional capacity was common, with 31.6% and 58.5% of patients having at least one impairment in Activities of Daily Living (ADL) and Instrumental ADL (IADL), respectively. An impaired Mini Nutritional Assessment (MNA) was identified in 64.3% of patients. The burden of comorbidities was high, with 63.1% of patients having at least one comorbidity of severity grade 3 or 4 according to the Cumulative Illness Rating Scale for Geriatrics (CIRS-G) criteria.
A median of 3 interventions (IQR 2-5) was proposed per patient. The most frequent intervention concerned nutritional support (74.5%) (Table 2). Physiotherapy and social support were proposed for 63.8% and 63.5% of patients, respectively. Overall, at least one intervention was proposed by the geriatrician for 91% of patients (N = 1032).

b Defined as absence of a primary caregiver or adequate support at home or a strong circle of family and friends able to meet the needs of the patient at time of evaluation.
c One or more of: at least 10% weight loss in 6 months or 5% in 1 month and/or body mass index < 21 kg/m2 and/or MNA score < 17/30 and/or serum albumin level < 35 g/L.
d Weight loss ≥10% in the last 6 months and/or ≥5% in the last month.

Prevalence of Unfit Patients by Reference Standard
The proportion of patients classified as unfit varied according to the reference standard (Tables 3 and 4).

Predictive Performance of Screening Tools by Reference Standard
On univariate logistic regression analyses (Table 3), abnormal scores on both the G8 and the modified G8 were significantly associated with all reference standards, regardless of definition. Among types of interventions, nutritional support had the strongest association (G8: OR 8.6, 95% CI 5.9-12.6; modified G8: OR 9.1, 95% CI 6.3-13). AUROC values were ≥0.80 for both tools and all tested definitions (Figure 2). They were significantly higher for the modified G8 for six of the seven tested reference standards: ≥1 GA impairment (modified G8 vs. original G8: 0.93 vs. 0.89), ≥2 GA impairments (0.90 vs. 0.87), ≥1 geriatric intervention (0.85 vs. 0.81), unfit according to the Balducci (0.86 vs. 0.80) and SIOG classifications (0.88 vs. 0.83), and the GA frailty index (0.86 vs. 0.84).
Table 4 details the diagnostic performance of each tool for the seven tested reference standards. Sensitivities based on optimal cutoffs ranged from 82% (original G8) and 84% (modified G8) for the Balducci classification to 91% (both tools) for ≥1 GA impairment, with significant differences in favor of the modified G8 for ≥2 GA impairments, the Balducci and SIOG classifications, and the GA frailty index. Most specificities were higher for the modified G8, ranging from 41% (≥1 geriatric intervention prescribed) to 63% (GA frailty index) for the original G8, and from 56% (≥1 geriatric intervention prescribed) to 75% (≥1 GA impairment) for the modified G8. With the exception of the LCT, positive predictive values (PPVs) were higher and negative predictive values (NPVs) were lower for the modified G8 than for the original G8, with better LRs for the modified G8.
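The metrics reported in Table 4 all derive from the 2×2 cross-classification of a dichotomized screening result against a reference standard, and paired sensitivities are compared with McNemar's test restricted to reference-positive patients. A minimal sketch of both computations follows; it assumes McNemar's statistic without continuity correction, since the variant used in the analysis is not stated here. Specificities can be compared the same way by restricting to reference-negative patients.

```python
import math
from typing import Dict, Sequence, Tuple


def diagnostic_metrics(test_pos: Sequence[bool], ref_pos: Sequence[bool]) -> Dict[str, float]:
    # Build the 2x2 table of screening result vs. reference standard.
    tp = sum(t and r for t, r in zip(test_pos, ref_pos))
    fp = sum(t and not r for t, r in zip(test_pos, ref_pos))
    fn = sum((not t) and r for t, r in zip(test_pos, ref_pos))
    tn = sum((not t) and (not r) for t, r in zip(test_pos, ref_pos))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_neg": (1 - sens) / spec,
    }


def mcnemar_sensitivity(test1: Sequence[bool], test2: Sequence[bool],
                        ref_pos: Sequence[bool]) -> Tuple[float, float]:
    """McNemar chi-squared (no continuity correction) comparing two paired
    sensitivities, using reference-positive patients only. Returns (chi2, p)."""
    b = sum(t1 and not t2 for t1, t2, r in zip(test1, test2, ref_pos) if r)
    c = sum(t2 and not t1 for t1, t2, r in zip(test1, test2, ref_pos) if r)
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p
```

Only the discordant pairs (b and c) enter McNemar's statistic, which is why two tools with similar overall sensitivity can still differ significantly if they disagree on many patients.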
When using the four-item IADL for men and the eight-item IADL for women in the definition of the SIOG classification (N = 1102), results were similar to those obtained with the eight-item IADL for all patients. Sensitivities and specificities were 86.7% and 59.9% for the original G8, and 89.6% and 67.8% for the modified G8. AUROC values were significantly higher for the modified G8 than for the original G8: 0.90 (95% CI 0.87-0.92) vs. 0.85 (0.82-0.87); p < 0.00001.

Discussion
In the present study of older cancer patients, we assessed the diagnostic performance of the original and modified G8 tools against different reference standards to evaluate their robustness. We tested seven reference standards evocative of a geriatric risk profile. Regardless of the reference standard tested, both tools showed high discriminative performance and were robust in detecting various definitions evocative of frailty. Statistically significant differences in AUROCs favored the modified G8 over the original G8 for ≥1 and ≥2 GA impairments, ≥1 geriatric intervention, the Balducci and SIOG classifications, and the GA frailty index, indicating better screening performance of the modified G8 for six of the seven tested reference standards. Both the G8 and the modified G8 also discriminated well for the subsequent prescription of geriatric interventions in relevant clinical domains. This finding is of particular clinical relevance because it relates directly to the main objective of the screening tools: identifying patients who would benefit from a complete GA. Beyond the conceptual difficulties of defining frailty, this finding further supports the pragmatic aim of the G8 instruments to adequately detect patients with potential deficits warranting interventions and optimization of treatment.
The G8 and modified G8 screening tools were originally developed to identify patients with ≥1 impairment in a multidimensional GA, which the SIOG [1] proposed as the reference standard for evaluating older cancer patients to determine the optimal oncologic treatment. However, a standardized definition of GA and, more importantly, of an abnormal GA is lacking. Indeed, the definition of an abnormal GA varies greatly across studies, which may use different numbers of components and different scales and thresholds for defining impairment, hence limiting the comparability of study results [7]. Furthermore, this pragmatic definition, the one most often used in the literature, does not correspond well to the clinical practice of geriatricians and oncologists, which limits its applicability and hinders its implementation in routine clinical care.
Other frailty classifications have been developed to help physicians select the best cancer treatment and guide geriatric interventions. In a recent study, the prognostic value of three classifications (Balducci, SIOG, and the LCT) was assessed and found to be good for 1-year mortality and 6-month unscheduled hospitalizations in older cancer patients [4], supporting their use to stratify older cancer patients according to their health status for clinical decision making, and also as a candidate reference definition for screening-test accuracy studies because of their predictive value for patient outcomes [35]. For example, some frailty criteria have been used to help evaluate the toxic effects of treatments.
On the basis of our results, the modified G8 appears to be an appropriate tool for identifying several profiles suggestive of frailty, whichever of the definitions debated over the past decades is used.
The present study is the first to thoroughly examine the variability of the diagnostic performance of screening tools for frailty in older patients with cancer under multiple clinically relevant reference standards or definitions. Adding to the previously reported high prognostic value of the two instruments [36], our findings reinforce the clinical utility of the G8 tools in daily geriatric oncologic practice.
Our study has some limitations. First, patients in our study population were referred to a geriatrician for GA, and some tumor locations were relatively infrequent (e.g., 7%, 11%, and 16% for hematological, prostate, and breast cancers, respectively); our results may thus not fully reflect the real-life population of older patients with cancer. Second, data were missing for some key variables needed to compute G8 scores and/or reference standards, although missing rates per variable were low overall (median 7%, range 0-17.6%). To allow a direct comparison of the screening tools in a common population, patients were excluded from the analysis when data on any of the six reference standards were unavailable. We found no statistically significant difference in demographic and clinical characteristics between included and excluded patients. In addition, it would have been of interest to assess other reference standards, such as the Fried phenotype [3], a well-established instrument developed for the general geriatric population but not specifically for older patients with cancer.
Further studies are necessary to corroborate our findings and to evaluate the predictive value of both tools for other relevant outcomes, namely, functional decline, treatment-related toxicity, and quality of life. In particular, it would be of special interest to determine whether geriatric management integrating the G8 or the modified G8 ultimately improves patient health outcomes.

Conclusions
Our findings demonstrate the robustness of the original and modified G8 against different reference standards, further supporting the clinical value of these instruments for detecting older patients with cancer who warrant a complete GA. The modified G8 demonstrated better diagnostic performance than the original G8 for detecting a variety of health profiles evocative of frailty. Our findings may thus offer a practical option for daily practice: an instrument able to detect potential geriatric problems regardless of the frailty definition used.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers14010244/s1, Table S1: Frailty classification approaches ("fit" vs. "unfit") and indicators according to the Balducci and International Society of Geriatric Oncology (SIOG) task force classifications, Table S2: Variables and definitions used to classify patients in the latent class typology, Table S3: Variables included in the Frailty Index (N = 52 health deficits included), Table S4: G8 screening questionnaire, Table S5: Modified-G8 screening questionnaire.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Restrictions apply to the availability of these data. Data were obtained from the ELCAPA Study Group and are available from the corresponding author with the permission of the ELCAPA Study Group investigators.